ABSTRACT

Higher-order confirmatory factor models positing one, 
two, and three higher-order factors were tested using class-averaged 
responses to the student rating instrument, Students' Evaluations of 
Educational Quality (SEEQ), developed by Marsh (1987). Three
higher-order factors, Presenter, Rapport, and Regulator, were
consistent across a sample of 6,322 classes of
undergraduates in representing 8 distinct SEEQ first-order factors. 
The three higher-order factors were found to be stable across classes 
that were different in terms of academic discipline (Social Science, 
Business, Engineering) and instructor level (Full Professor, 
Associate Professor, Assistant Professor). The study results 
supported the three higher-order factors as potential composite 
measures of college instruction for practical purposes in faculty 
teaching evaluation. Four tables and one figure present data; an 
appendix presents the survey form. (Author/SLD) 
Abstract

Higher-order confirmatory factor models positing one, two,
and three higher-order factors were tested using
class-averaged responses to the student rating instrument
developed by Marsh (1987), Students' Evaluations of
Educational Quality (SEEQ). Three higher-order factors,
Presenter, Rapport, and Regulator, were consistent across a
sample of 6,322 classes in representing eight
distinct SEEQ first-order factors. The three higher-order
factors were found to be stable across classes differing in
academic discipline (Social Science, Business,
Engineering) and instructor level (Full Professor, Associate
Professor, Assistant Professor). The study results
supported the three higher-order factors as potential
composite measures of college instruction for practical
purposes in faculty teaching evaluation.
Student ratings have been widely used as a measure of
teaching effectiveness in universities and colleges.
Student ratings of instructors and courses have gained
wider acceptance than other available evaluation
methods such as faculty self-evaluation, ratings by
former students, peer reviews, and judgments of trained
observers. Student ratings continue to be popular. A
survey of 600 liberal arts colleges by Seldin (1993)
reported that the use of student ratings in these colleges
increased from 29 percent in 1973 to 68 percent in 1983
and to 86 percent in 1993. In spite of this popularity, the
use of student evaluations for summative purposes in
personnel decisions involving salary, tenure, and promotion
has not found general agreement.

Many researchers (e.g., Abrami, 1989; Braskamp, 
Brandenburg, & Ory, 1984; Centra, 1977; Doyle & Whitely, 
1974) have favored the use of global student ratings 
(overall instructor effectiveness or overall course 
effectiveness) for personnel purposes. Abrami (1989) argued 
that teaching is a unitary construct and student ratings of 
teaching effectiveness should be represented by a single or 
global index. Braskamp et al. (1984) suggested using
global, high-inference rating items for personnel decisions
and specific, low-inference rating items for diagnostic
feedback and other non-personnel purposes. Centra
(1977) and Doyle and Whitely (1974) endorsed the use of 
global ratings for faculty tenure and promotion decisions to 
the extent that the global ratings are valid criteria of
instructional effectiveness and bear a moderate relationship
with student learning. Other researchers (e.g., Feldman,
1976; Kulik & Kulik, 1974; Marsh, 1987; McKeachie, Lin, &
Mann, 1971) have supported the view that students'
evaluations of teaching effectiveness are multidimensional.
Marsh (1987) has argued that teaching is multifaceted; for
example, a teacher might be well organized but lack
enthusiasm. Student ratings, like the teaching they
represent, should be multidimensional. According to this
view, any instrument that focuses on a single aspect of
teaching is likely to be inadequate (e.g., Barnes & Barnes,
1993; Murray, Rushton, & Paunonen, 1990).

In a survey of experts in student evaluation, Johnson 
(1989) found an evenly split opinion from the experts 
concerning the use of student ratings for personnel 
decisions. In faculty evaluation, administrative committees
commonly judge the quality of a faculty member's teaching
on a single continuum from poor to excellent.
This common practice has raised concerns that
administrative committees, unlike researchers, are not well
trained to interpret student evaluation data presented as a
profile of multiple scores. It also makes it desirable to
summarize student information in a single score or a few
composite scores.
Several methodological alternatives for summarizing multiple 
scores have been suggested in the literature. One 
alternative is to derive a differential weight for each score
in a multidimensional profile so that an overall weighted
average score can be obtained (Abrami, 1985; Marsh, 1991).
This overall weighted score can then be used as a single
index of teaching effectiveness. Another alternative is to
use factor analysis to probe the possibility of higher-order
factors, each defined as a composite of two or more
first-order factors. Higher-order factors are potentially
more stable constructs and easier to interpret than the
multitude of first-order factors. As composites,
higher-order factors give insights into the structure of
latent variables that are not normally available from
first-order factors.

The purpose of this study is to illustrate the 
methodological alternative of confirmatory factor analysis 
(CFA) in testing higher-order factors of a multidimensional 
rating instrument that was developed using Marsh's (1987) 
Students' Evaluations of Educational Quality (SEEQ). A
comprehensive review of the research that led to the design
of the SEEQ survey is given by Marsh (1987).
SEEQ is an evaluation instrument designed to measure
multiple aspects of teaching effectiveness in university
and college classrooms. Numerous studies using
exploratory factor analysis (e.g., Marsh, 1991; Marsh &
Hocevar, 1991) have shown that responses to the SEEQ
instrument consistently represent nine distinct
factors of teaching effectiveness: Learning/Value,
Instructor Enthusiasm, Organization/Clarity, Breadth of
Coverage, Group Interaction, Individual Rapport,
Examinations/Grading, Assignments/Readings, and
Workload/Difficulty. These SEEQ first-order
factors are known to be highly correlated. The correlations
among these first-order factors can in turn be factor
analyzed, and the resulting factors are termed "second-order
factors." Second-order factor analysis has not been
frequently applied and is not widely known or understood
(Thompson & Borrello, 1992).

The first higher-order analysis of the SEEQ instrument
was conducted by Marsh (1991) using responses to the
survey from 500 classes in the Social Science
Division. Four higher-order models positing one, two,
three, and four second-order factors were hypothesized and
tested using the covariances of the nine SEEQ first-order
factors. The model with four second-order factors was shown
to fit the data better than the others and to explain about
75% of the variance in the first-order factors. The four
second-order factors identified by Marsh (1991), with their
clusters of first-order factors, were Presenter
(Learning/Value, Instructor Enthusiasm,
Organization/Clarity, Breadth of Coverage),
Rapport (Group Interaction, Individual Rapport), Course
Materials (Examinations/Grading, Assignments/Readings), and
Workload (Assignments/Readings, Workload/Difficulty).
Another higher-order factor analysis of the SEEQ responses
was performed by Vogt and Hocevar (1993) with a sample of
over 15,000 classes in six academic disciplines
(Communication, Journalism, Business, Social Science,
Engineering, and Political Science). Across the six
academic disciplines, two second-order factors were used to
summarize five first-order factors, with Learning/Value,
Organization/Clarity, and Breadth of Coverage forming the
first second-order factor and Group Interaction and
Individual Rapport forming the second. The two second-order
factors identified by Vogt and Hocevar (1993)
exhibited a pattern similar to that of Marsh's (1991) two
second-order factors, Presenter and Rapport. These two
factors have been consistently identified as
dominant characteristics of good teaching (e.g., Bendig,
1953; Creager, 1950; Finkbeiner et al., 1973; Frey, 1978;
Hartley & Hogan, 1972; Isaacson et al., 1964). The
Presenter factor reflects the overall ability of the
instructor to stimulate student learning through skillful
presentation of materials, broad coverage of subject matter,
and clarity in organizing the course. The Rapport factor is
equally well supported in the literature. The instructor's
interaction with students and personal attitude
toward students constitute an important characteristic of
effective teaching.

In deriving higher-order factors, Marsh (1991) and Vogt
and Hocevar (1993) differed in their analysis of the SEEQ
rating items. While Marsh (1991) incorporated all 35 SEEQ
items in his higher-order models, Vogt and Hocevar (1993)
used only 20 of the 35 items. The first-order factors that
were excluded by Vogt and Hocevar (1993) had exhibited
inconsistent patterns of loading on second-order factors in
a preliminary analysis. Specifically, according to Vogt and
Hocevar (1993), factors relating to the instructor or
controlled by the instructor (presentation skill, course
organization, individual rapport, group interaction,
instructor enthusiasm, breadth of coverage) were stable
components of the two higher-order factors. Factors that
were perceived as only partially related to the instructor's
ability and influence in the classroom (examinations,
assignments, workload difficulty) were not stable components
of the posited higher-order factors.

Method 

Sample 

The sample for this study was obtained from responses 
to the SEEQ survey instrument from approximately 7,407 
undergraduate classes at a large private university in the 
United States between 1980 and 1990. Classes with
incomplete responses or fewer than ten students were
excluded from the data analysis. The final sample for the
study consisted of 6,322 classes, with the unit of analysis
being the class-average rating across all students in the
same class. Classes were further divided into
subgroups according to academic division and instructor
rank. Three academic subgroups (Social Science, Business,
Engineering) and three instructor subgroups (Assistant
Professor, Associate Professor, Full Professor) were
constructed from the total sample (Table 1).
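
The aggregation step described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used in the study: the function name `class_averages`, the input layout (one pair per student form), and the reading that a class is dropped if any of its forms has a missing item are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

MIN_STUDENTS = 10  # classes with fewer respondents are excluded


def class_averages(forms, min_students=MIN_STUDENTS):
    """Return {class_id: [mean rating per item]} for retained classes.

    `forms` is an iterable of (class_id, ratings) pairs, one per student,
    where `ratings` is a list of item scores and None marks a missing item.
    """
    by_class = defaultdict(list)
    for class_id, ratings in forms:
        by_class[class_id].append(ratings)

    averages = {}
    for class_id, rows in by_class.items():
        incomplete = any(score is None for row in rows for score in row)
        if incomplete or len(rows) < min_students:
            continue  # class excluded from the analysis sample
        # zip(*rows) regroups the scores item by item across students
        averages[class_id] = [mean(scores) for scores in zip(*rows)]
    return averages


# Hypothetical data: class "A" is kept, "B" is too small, "C" has a gap.
forms = (
    [("A", [4, 5])] * 10
    + [("B", [3, 3])] * 9
    + [("C", [5, 4])] * 9
    + [("C", [5, None])]
)
kept = class_averages(forms)
```

Each retained class contributes one row of item means, which is the unit of analysis for the covariance matrices analyzed later.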



Insert Table 1 about here 



Measures 

The SEEQ has 35 rating items with scales from "1 = Very
Poor" to "5 = Very Good" (Appendix A). Clusters of these
items are expected to load on the nine factors as follows:
Learning/Value (items 1-4), Instructor Enthusiasm (items 5-8),
Organization/Clarity (items 9-12), Group Interaction (items
13-16), Individual Rapport (items 17-20), Breadth of Coverage
(items 21-24), Examinations/Grading (items 25-27),
Assignments/Readings (items 28-29), and Workload/Difficulty
(items 32-35). The form has two global rating items, with
item 30 measuring overall course effectiveness and item 31
measuring overall instructor effectiveness.
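
The item-to-factor assignment listed above can be written down as a small lookup table, which makes the item counts easy to check. The names `SEEQ_FACTOR_ITEMS`, `GLOBAL_ITEMS`, and `factor_items` are illustrative only and are not part of the SEEQ instrument.

```python
# Item numbers per first-order factor, as listed in the text.
SEEQ_FACTOR_ITEMS = {
    "Learning/Value": range(1, 5),
    "Instructor Enthusiasm": range(5, 9),
    "Organization/Clarity": range(9, 13),
    "Group Interaction": range(13, 17),
    "Individual Rapport": range(17, 21),
    "Breadth of Coverage": range(21, 25),
    "Examinations/Grading": range(25, 28),
    "Assignments/Readings": range(28, 30),
    "Workload/Difficulty": range(32, 36),
}

# The two global rating items.
GLOBAL_ITEMS = {30: "overall course", 31: "overall instructor"}


def factor_items(exclude=()):
    """Sorted item numbers of all factors whose name is not in `exclude`."""
    return sorted(
        item
        for name, items in SEEQ_FACTOR_ITEMS.items()
        if name not in exclude
        for item in items
    )


all_items = sorted(factor_items() + list(GLOBAL_ITEMS))
without_workload = factor_items(exclude=("Workload/Difficulty",))
print(len(all_items), len(without_workload))
```

The nine clusters plus the two global items cover items 1 through 35 exactly once, and dropping the Workload/Difficulty cluster leaves the 29 factor-specific items used for the final measurement model.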

To evaluate the loadings of the measures on the
separate first-order factors, a nine-factor measurement
model with all 35 items was estimated from the sample of the
total group. The results showed that the interfactor
correlations were all high, as expected, except those for the
Workload/Difficulty factor. The mean correlation between
Workload/Difficulty and all other factors was .130, while the
mean correlation among the other eight factors was .758.
Based on this finding, the measurement model was
re-estimated with 31 items, excluding the Workload/Difficulty
factor. The mean correlation among the eight factors in the
re-estimated model remained essentially the same (.759).
Thus, the removal of the Workload/Difficulty factor had
essentially no effect on the intercorrelations of the
remaining factors in the measurement model.

The reduced eight-factor measurement model was
re-estimated once more, this time with only 29 items.
Global item 30 was prevented from loading on the
Learning/Value factor and global item 31 from loading on the
Instructor Enthusiasm factor. Without the two global items,
the mean interfactor correlation dropped slightly, from .759
to .743. This 2% decrease in interfactor correlation indicated
that the global items had only a negligible unique effect on
the measurement of the first-order factors. On this
evidence, the more parsimonious eight-factor model with 29
measured items was adopted as the final measurement model
for testing higher-order factors.
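
The two summary statistics used in this comparison, the mean interfactor correlation and its relative change, can be sketched as below. The function names and the 3x3 matrix are hypothetical; only the values .759 and .743 come from the text.

```python
def mean_interfactor_correlation(corr):
    """Mean of the below-diagonal entries of a square correlation matrix."""
    k = len(corr)
    pairs = [corr[i][j] for i in range(k) for j in range(i)]
    return sum(pairs) / len(pairs)


def relative_change(before, after):
    """Change from `before` to `after` as a fraction of `before`."""
    return (after - before) / before


# Hypothetical 3-factor correlation matrix, for illustration only.
toy = [
    [1.0, 0.8, 0.7],
    [0.8, 1.0, 0.6],
    [0.7, 0.6, 1.0],
]
toy_mean = mean_interfactor_correlation(toy)

# Reported means with (.759) and without (.743) the two global items.
drop = relative_change(0.759, 0.743)
print(f"{drop:.1%}")
```

With the reported means, the relative change is a drop of roughly 2%, in line with the figure quoted in the text.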

The model for higher-order factors is LISREL
structural submodel 3A (Joreskog & Sorbom, 1989). The
structural model is a second-order factor analysis model
that simultaneously estimates the measurement of the latent
variables and their structural relationships to each other.
Model specification requires the following matrices:
LAMBDA Y as the matrix of first-order factor loadings, PSI as
the matrix of first-order factor variance-covariances, GAMMA
as the matrix of second-order factor loadings, PHI as the
matrix of second-order factor variance-covariances, and
THETA EPSILON as the matrix of errors/uniquenesses in
measurement. Seven 29x29 sample covariance matrices, one
for the total group and one for each of the six subgroups,
were the basis for estimating the higher-order factors.
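
To make the roles of the five matrices concrete, the sketch below computes the covariance matrix implied by a second-order factor model, using the standard expression Sigma = LAMBDA_Y (GAMMA PHI GAMMA' + PSI) LAMBDA_Y' + THETA_EPSILON. The tiny model (four indicators, two first-order factors, one second-order factor) and all parameter values are hypothetical and are not estimates from this study; the helper names are illustrative.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]


def implied_covariance(lambda_y, gamma, phi, psi, theta_eps):
    """Sigma = LY (GA PH GA' + PS) LY' + TE for a second-order model."""
    # Covariance of the first-order factors: the part explained by the
    # second-order factors plus the residual (PSI) part.
    eta_cov = add(matmul(matmul(gamma, phi), transpose(gamma)), psi)
    return add(matmul(matmul(lambda_y, eta_cov), transpose(lambda_y)),
               theta_eps)


# Hypothetical parameters: 4 indicators, 2 first-order, 1 second-order.
lambda_y = [[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.9]]
gamma = [[0.9], [0.8]]
phi = [[1.0]]
psi = diag([0.19, 0.36])
theta_eps = diag([0.3, 0.3, 0.3, 0.3])

sigma = implied_covariance(lambda_y, gamma, phi, psi, theta_eps)
```

Estimation then amounts to choosing the free parameters so that this implied matrix is as close as possible to the sample covariance matrix; in this study there were seven such 29x29 sample matrices.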

Results and Discussion 
SEEQ Measurement Model

A prerequisite for higher-order analysis is the
adequacy of the measurement model, which represents the
measured portion of the total model. If the measurement of
the first-order factors is weak or inadequate, then the
higher-order factors hypothesized to represent
these first-order factors would be inconsequential. The fit
of the measurement model provides an indication of how well
the first-order factors are represented by the sample data.
A number of fit indices are available in the LISREL output:
chi-square, goodness-of-fit index (GFI), and adjusted
goodness-of-fit index (AGFI). Two additional fit indices
were included in the model fit assessment: Bentler and
Bonett's normed fit index (NFI) and the Tucker-Lewis index
(TLI) (Gerbing & Anderson, 1992). Results for these fit
indices are presented in Table 2.
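
Table 2 reports 349 degrees of freedom for the measurement model in every group. One way to arrive at that figure is sketched below, assuming that each of the 29 items loads on exactly one of the eight factors, that factor variances are fixed at 1, and that all factor correlations are free. This parameterization is an assumption made for illustration; the text does not spell it out, and the function name is hypothetical.

```python
def measurement_model_df(n_items, n_factors):
    """Degrees of freedom of a simple-structure CFA measurement model."""
    # Distinct variances and covariances among the observed items.
    moments = n_items * (n_items + 1) // 2
    # Free parameters: one loading and one uniqueness per item,
    # plus one correlation per pair of factors.
    loadings = n_items
    uniquenesses = n_items
    factor_correlations = n_factors * (n_factors - 1) // 2
    return moments - (loadings + uniquenesses + factor_correlations)


df = measurement_model_df(n_items=29, n_factors=8)
print(df)
```

Under these assumptions the count is 435 moments minus 86 free parameters, which matches the tabled value.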



Insert Table 2 about here 



Relative to the null model, the SEEQ eight-factor
measurement model represented a substantial improvement in
incremental fit. Across the total group and all six
subgroups, the NFI varied from .894 to .912 and the TLI
from .880 to .902. Together, these two indices
suggested that the eight-factor model provided an acceptable
fit to the sample data. The equally strong fit for the
total group and for each of the six subgroups
demonstrated that the SEEQ first-order factors were
generalizable across classroom conditions differing in
academic discipline and instructor level.
SEEQ Higher-Order Factor Models

Three higher-order factor models were tested in this
study. The first model posited a global factor in which all
eight first-order factors were constrained to load on a
single second-order factor. The second model posited two
higher-order factors similar to Marsh's (1991) two
second-order Skill and Rapport factors. The third model
posited three higher-order factors, Presenter, Rapport, and
Regulator, similar to Marsh's (1991) three second-order
factor model. Existing theory and knowledge in student
evaluation research were the basis for postulating these
higher-order models and were briefly reviewed by Marsh
(1991). Each of the three higher-order models was estimated
using samples from the total group and from each of the six
subgroups. Two goodness-of-fit indices were used to assess
the fit of the higher-order models: the Tucker-Lewis index
(TLI) and the Relative Noncentrality index (RNI) (Table 3).
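
The two indices can be computed directly from the chi-square values and degrees of freedom reported in Table 3. The sketch below uses the usual definitions of the TLI and RNI; the function names are illustrative, and the degrees-of-freedom helper assumes that each first-order factor loads on exactly one second-order factor, which is an assumption that happens to reproduce the tabled df rather than something stated in the text.

```python
def tli(chi2_null, df_null, chi2_model, df_model):
    """Tucker-Lewis index relative to a null model."""
    null_ratio = chi2_null / df_null
    return (null_ratio - chi2_model / df_model) / (null_ratio - 1.0)


def rni(chi2_null, df_null, chi2_model, df_model):
    """Relative noncentrality index relative to a null model."""
    return 1.0 - (chi2_model - df_model) / (chi2_null - df_null)


def second_order_df(df_null, n_first_order, n_second_order):
    """df after adding second-order loadings and factor correlations."""
    correlations = n_second_order * (n_second_order - 1) // 2
    return df_null - n_first_order - correlations


# Total-group values from Table 3: (chi-square, df).
NULL = (238123, 377)
MODELS = {"H1": (33264, 369), "H2": (32324, 368), "H3": (32040, 366)}

fit = {
    name: (tli(*NULL, chi2, df), rni(*NULL, chi2, df))
    for name, (chi2, df) in MODELS.items()
}
for name, (t, r) in fit.items():
    print(name, round(t, 3), round(r, 3))
```

Because the tabled chi-square values are rounded, the recomputed indices agree with Table 3 only to within about .001.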

Insert Table 3 about here



The fit of the model with three higher-order factors 
was clearly better than the fit for the other two higher- 
order models. Loadings on the three higher-order factors 
were consistently high. The total group had a mean loading 
of .891. Among the six subgroups, the business subgroup 
showed the lowest mean loading (.886) and the associate 
professor subgroup the highest mean loading (.895). These 
high factor loadings confirmed the stability of the factor 
structure underlying the three higher-order factors. The 
equally strong and consistent patterns of factor loadings in 
the six different subgroups provided supporting evidence for 
the generality of the higher-order factor structure across
the different academic disciplines and instructor levels
analyzed in this study.

Given the adequate fit of the model positing three 
higher-order factors, a key issue of interest was whether 
these higher-order factors were well-defined and easily 
interpreted. A high residual variance in the first-order
factors would mean that too much information was left
unaccounted for by the higher-order factors. A high shared
variance between two higher-order factors would be
incompatible with the existence of the higher-order factors
as distinct latent constructs. The PSI matrix of first-order
factor residual variances showed that, for the total group
and subgroups, about 20% of the variance in the first-order
factors was left unaccounted for by the posited
higher-order factors. The shared variances of these
higher-order factors were obtained from the standardized PHI
matrix of second-order factor correlations
(Table 4).



Insert Table 4 about here 



The mean of higher-order factor correlations for the total 
group was .906, for academic subgroups .903, and for 
instructor subgroups .905. The square of these correlations 
provided a basis for estimating the amount of common shared 
variance of the higher-order factors. For the total group 
the estimated shared variance was 82%, for academic subgroups
81%, and for instructor subgroups 82%. These extremely high
shared variances suggested that the higher-order factors
were not well differentiated as distinct latent constructs of
student ratings of teaching effectiveness.
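
The shared-variance figure for the total group can be reproduced from the three correlations in Table 4. The sketch assumes that the mean correlation is squared (rather than averaging the squared correlations); the text does not say which was done, and the names used here are illustrative.

```python
from statistics import mean

# Total-group correlations among the higher-order factors (Table 4).
TOTAL_GROUP = {
    ("HOF 1", "HOF 2"): 0.900,
    ("HOF 1", "HOF 3"): 0.943,
    ("HOF 2", "HOF 3"): 0.874,
}


def shared_variance(correlations):
    """Return (mean correlation, squared mean correlation)."""
    r = mean(correlations)
    return r, r * r


r_mean, r_squared = shared_variance(TOTAL_GROUP.values())
print(round(r_mean, 3), f"{r_squared:.0%}")
```

For the total group this gives a mean correlation of about .906 and a shared variance of about 82%, matching the values quoted above. The same function can be applied to the subgroup correlations in Table 4.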

The results of this study have confirmed previous 
findings that the SEEQ specific dimensions of student 
ratings of classroom instruction could not be summarized in 
terms of a few composite scores without loss of much 
significant information. Even though three second-order 
factors have been consistently identified across a variety 
of classroom conditions, the second-order factors
accounted for about 80% of the true score variance in the
underlying first-order factors. The very high
intercorrelations among the second-order factors suggested
that these factors could themselves be underlain by a third
higher-order factor. This plausible alternative was not
explored in this study.

The importance of higher-order factors in understanding
how specific dimensions of student ratings relate to the
overall quality of classroom teaching requires further
inquiry. If higher-order factors are to be incorporated
into personnel decisions as substitutes for the multitude of
first-order factors for ease of decision making, current
knowledge of students' evaluations of teaching effectiveness
can be advanced by applying empirical assessment
methods such as higher-order analysis to test models of
theoretical interest.

Table 1

Total group and subgroup samples

                              Academic subgroup
Instructor subgroup        Soc      Bus      Eng      Total

Assistant Professor        814      858      501      2,173
Associate Professor        981      685      388      2,054
Full Professor             952      411      732      2,095
Total                    2,747    1,954    1,621      6,322

Note. Soc=Social Science, Bus=Business, Eng=Engineering.






Table 2

Goodness-of-fit indices for SEEQ measurement model

                          Total group and subgroups

                Tot      Soc      Bus      Eng     Full     Asso     Assi

Chi-square   29,388   12,642   11,145    7,362    9,911   10,748   10,401
df              349      349      349      349      349      349      349
GFI            .740     .741     .705     .738     .738     .720     .730
AGFI           .676     .677     .632     .674     .674     .651     .664
NFI            .911     .909     .894     .912     .908     .902     .909
TLI            .897     .896     .880     .902     .896     .889     .897

Note. Tot=Total group, Soc=Social Science subgroup,
Bus=Business subgroup, Eng=Engineering subgroup, Full=Full
Professor subgroup, Asso=Associate Professor subgroup,
Assi=Assistant Professor subgroup.

Table 3

Goodness-of-fit indices for models positing First-Order (FO)
and Second-Order (SO) factors for total group

          Number of factors         Goodness-of-fit indices
Model       FO       SO        df    Chi-square     TLI     RNI

H0n          8        0       377       238,123
H1           8        1       369        33,264    .859    .861
H2           8        2       368        32,324    .862    .866
H3           8        3       366        32,040    .863    .867

Note. H0n=Higher-Order null model in which all first-order
factors are uncorrelated.

Table 4

Higher-order factor (HOF) correlations for total group and
subgroups

                         HOF 1    HOF 2    HOF 3

Total group
  HOF 1                    -
  HOF 2                  .900       -
  HOF 3                  .943     .874       -

Social Science
  HOF 1                    -
  HOF 2                  .863       -
  HOF 3                  .941     .894       -

Business
  HOF 1                    -
  HOF 2                  .898       -
  HOF 3                  .956     .849       -

Engineering
  HOF 1                    -
  HOF 2                  .921       -
  HOF 3                  .941     .868       -

Full Professor
  HOF 1                    -
  HOF 2                  .890       -
  HOF 3                  .934     .856       -

Associate Professor
  HOF 1                    -
  HOF 2                  .895       -
  HOF 3                  .946     .881       -

Assistant Professor
  HOF 1                    -
  HOF 2                  .917       -
  HOF 3                  .949     .881       -

Figure 1. Three second-order factor models

Appendix A

Students' Evaluations of Educational Quality (SEEQ) 

Survey Form 